Abstract

We present simple algorithms for achieving self-stabilizing location management and routing in mobile 
ad-hoc networks. While mobile clients may be susceptible to corruption and stopping failures, mobile 
networks are often deployed with a reliable GPS oracle, supplying frequent updates of accurate real time 
and location information to mobile nodes. Information from a GPS oracle provides an external, shared 
source of consistency for mobile nodes, allowing them to label and timestamp messages, and hence aiding 
in identification of, and eventual recovery from, corruption and failures. Our algorithms use a GPS oracle. 

Our algorithms also take advantage of the Virtual Stationary Automata programming abstraction, 
consisting of mobile clients, virtual timed machines called virtual stationary automata (VSAs), and a 
local broadcast service connecting VSAs and mobile clients. VSAs are distributed at known locations 
over the plane, and emulated in a self-stabilizing manner by the mobile nodes in the system. They serve 
as fault-tolerant building blocks that can interact with mobile clients and each other, and can simplify 
implementations of services in mobile networks. 

We implement three self-stabilizing, fault-tolerant services, each built on the prior services: (1) VSA-to-VSA
geographic routing, (2) mobile client location management, and (3) mobile client end-to-end
routing. We use a greedy version of the classical depth-first search algorithm to route messages between 
VSAs in different regions. The mobile client location management service is based on home locations: 
Each client identifier hashes to a set of home locations, regions whose VSAs are periodically updated with 
the client's location. VSAs maintain this information and answer queries for client locations. Finally, the 
VSA-to-VSA routing and location management services are used to implement mobile client end-to-end 
routing. 

Keywords: virtual infrastructure, location management, home locations, end-to-end routing, hash
functions, self-stabilization, GPS oracle
1 Introduction

A system with no fixed infrastructure in which mobile clients may wander in the plane and assist each 
other in forwarding messages is called an ad-hoc network. The task of designing algorithms for constantly 
changing networks is difficult. Highly dynamic networks, however, are becoming increasingly prevalent, 
especially in the context of pervasive and ubiquitous computing, and it is therefore important to develop 
and use techniques that simplify this task. In addition, mobile nodes in these networks may suffer from 
crash failures or corruption faults, which cause arbitrary changes to their program states. Self-stabilization 
[4, 5] is the ability to recover from an arbitrarily corrupt state. This property is important in long-lived, 
chaotic systems where certain events can result in unpredictable faults. For example, transient interference 
may disrupt wireless communication, violating our assumptions about the broadcast medium. 

Mobile networks are often deployed in conjunction with "reliable" GPS services, supplying frequent 
updates of real time and region information to mobile nodes. While the mobile clients may be susceptible to 
corruption and stopping failures, the GPS service may not be. Each of our algorithms utilizes such a reliable 
GPS oracle. Information from this oracle provides an external, shared source of consistency for mobile nodes, 
allowing them to label and timestamp their messages, and hence, aiding in identification of, and recovery 
from, corruption and stopping failures. 

In this paper we describe self-stabilizing algorithms that use a reliable GPS oracle to provide geographic 
routing, a mobile client location management service, and a mobile client end-to-end routing service. Each 
service is built on the prior services such that the composition of the services remains self-stabilizing [11]. 
In order to route location information between geographic regions, we use a greedy version of the classical 
depth-first search algorithm. This service is then used to help implement the location management service; 
each mobile client identifier hashes to a set of home locations, geographical regions that are periodically 
updated with the location of the client, and that are responsible for then answering queries about the 
client's location. Both of these services are then used to implement point-to-point routing between mobile 
clients in the network. 

In order to simplify the implementations of the location management and routing services, we mask the 
unpredictable behavior of mobile nodes by using a self-stabilizing virtual infrastructure, consisting of mobile 
client automata, timing-aware and location-aware machines at fixed locations, called Virtual Stationary 
Automata (VSAs) [8, 9], that mobile clients can interact with and use to coordinate their actions, and a local 
broadcast service connecting VSAs and mobile clients. 

Self-stabilization and GPS oracles. Traditionally, studies of self-stabilizing systems are concerned with 
those systems that can be started from arbitrary configurations and eventually regain consistency without 
external help. However, mobile clients often have access to some reliable external information from a service 
such as GPS. Each of our algorithms in this paper uses an external GPS service (or an equivalent service) 
as a reliable GPS oracle, providing periodic real time clock and location updates, to base stabilization upon; 
our algorithms use timestamps and location information to tag events. In an arbitrary state, recorded 
events may have corrupted timestamps. Corrupted timestamps indicating future times can be identified 
and reset to predefined values; new events receive newer timestamps than any in the arbitrary initial state. 
This eventually allows nodes in the system to totally order events. We use the eventual total order to 
provide consistency of information and distinguish between incarnations of activity (such as retransmissions 
of messages). 
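To make this timestamp discipline concrete, here is a minimal Python sketch. It assumes each recorded event carries a GPS-derived timestamp and the id of the node that recorded it. The reset value `RESET_TS`, the `Event` record, and breaking timestamp ties by node id are assumptions made for this illustration only, not details of the algorithms in this paper.

```python
from dataclasses import dataclass

RESET_TS = 0.0  # hypothetical predefined value for corrupted (future) timestamps

@dataclass(frozen=True)
class Event:
    ts: float      # timestamp read from the GPS-supplied clock
    node: str      # id of the recording node, used only to break ties
    payload: str

def sanitize(events, now):
    """Reset any timestamp later than now (assumed to arise only from corruption)."""
    return [e if e.ts <= now else Event(RESET_TS, e.node, e.payload) for e in events]

def total_order(events):
    """Order events by (timestamp, node id)."""
    return sorted(events, key=lambda e: (e.ts, e.node))

now = 100.0
state = [Event(50.0, "p2", "a"), Event(1e9, "p1", "corrupt"), Event(50.0, "p1", "b")]
clean = sanitize(state, now)
# An event recorded after stabilization is stamped with the current (later) time,
# so it is ordered after everything left over from the arbitrary initial state.
clean.append(Event(now + 1.0, "p3", "new"))
print([e.payload for e in total_order(clean)])  # prints ['corrupt', 'b', 'a', 'new']
```

Resetting to the earliest value, rather than to `now`, keeps every reset event ordered before any event created after the reset.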

Virtual Stationary Automata programming layer. In prior work [8, 7, 6], we developed a notion of 
"virtual nodes" for mobile ad hoc networks. A virtual node is an abstract, relatively well-behaved active node 
that is implemented using less well-behaved real physical nodes. The GeoQuorums algorithm [7] proposes 
storing data at fixed locations; however, it only supports atomic objects, rather than general automata.
A more general virtual mobile automaton is suggested in [6]. Finally, the virtual automata presented in 
[8, 9] (and used here) are more powerful than those of [6], providing timing capabilities needed for many 
applications. These automata are stationary and arranged in a connected pattern similar to that of a 
traditional wired network. 

The static infrastructure we use in this paper includes fixed, timed virtual machines with an explicit notion 
of real time, called Virtual Stationary Automata (VSAs), distributed at known locations over the plane [8, 9]. 
Each VSA represents a predetermined geographic area and has broadcast capabilities similar to those of the
mobile physical nodes, allowing nearby VSAs and mobile nodes to communicate with one another. Many
algorithms depend significantly on timing, and it is reasonable to assume that many mobile nodes have access 
to reasonably synchronized clocks. In the VSA layer, VSAs also have access to virtual clocks, guaranteed 
to not drift too far from real time. The layer provides mobile nodes with a fixed virtual infrastructure, 
reminiscent of more traditional and better understood wired networks, with which to coordinate their actions. 

Our clock-enabled VSA layer is emulated by physical mobile nodes in the network. Each physical node is 
periodically informed of its region by the GPS. A VSA for a particular region is then emulated by a subset of
the mobile nodes in its region: the VSA state is maintained in the memory of the physical nodes emulating 
it, and the physical nodes perform VSA actions on behalf of the VSA. If no physical nodes are in the region, 
the VSA fails; if physical nodes later arrive, the VSA restarts. 

An important property of the VSA layer implementation described in [8, 9] is that it is self-stabilizing. 
Corruption failures at physical nodes can result in inconsistency in the emulation of a VSA. Our implementation,
however, can recover after corruptions to correctly emulate a VSA. To algorithms running on the VSA
layer, the VSA simply appears to suffer from a corruption.

Geographic/ VSA-to-VSA routing. A basic service running on the VSA layer that we describe and 
use repeatedly is that of VSA-to-VSA (region-to-region) routing (VtoVComm), providing a form of geocast.
GeoCast algorithms [24, 3], GOAFR [19], and algorithms for "routing on a curve" [23] route messages
based on the location of the source and destination, using geography to deliver messages efficiently. GPSR
[17], AFR [20], GOAFR+ [19], polygonal broadcast [10], and the asymptotically optimal algorithm [20]
are based on greedy geographic routing, forwarding messages to the neighbor that is
geographically closest to the destination. The algorithms also address "local minimum situations", where the
greedy decision cannot be made. GPSR, GOAFR+, and AFR achieve, under reasonable network behavior, a 
linear order expected cost in the distance between the sender and the receiver. We implement VSA-to-VSA 
routing using a persistent greedy depth-first search (DFS) routing algorithm that runs on top of the VSA 
layer's fixed infrastructure. Our scheme is an application of the classical DFS algorithm in a new setting.

Location management. Finding the location of a moving client in an ad-hoc network is difficult, much
more so than in cellular mobile networks where a fixed infrastructure of wired support stations exists (as in
[16]), or in sensor networks where some approximation of a fixed infrastructure may exist [2]. A location 
service in ad-hoc networks is a service that allows any client to discover the location of any other client 
using only its identifier. The basic paradigm for location services that we use here is that of a home location 
service: Hosts called home location servers are responsible for storing and maintaining the location of other 
hosts in the network [1, 14, 21]. Several ways to determine the sets of home location servers, both in the 
cellular and entirely ad-hoc settings, have been suggested. 

The locality aware location service (LLS) in [1] for ad-hoc networks is based on a hierarchy of lattice 
points for destination nodes, published with locations of associated nodes. Lattice points can be queried 
for the desired location, with a query traversing a spiral path of lattice nodes increasingly distant from the 
source until it reaches the destination. Another way of choosing location servers is based on quorums. A set 
of hosts is chosen to be a write quorum for a mobile client and is updated with the client's location. Another 
set is chosen to be a read quorum and queried for the desired client location. Each write and read quorum 
has a nonempty intersection, guaranteeing that if a read quorum is queried, the results will include the latest 
location of the client written to a write quorum. In [14], a uniform quorum system is suggested, based on a 
virtual backbone of quorum representatives. Geographic quorums based on the focal points abstraction are 
suggested in [7]. 

Location servers can also be chosen using a hash table. Some papers [21, 15, 25] use geographic locations 
as a repository for data. These use a hash to associate each piece of data with a region of the network and 
store the data at certain nodes in the region. This data can then be used for routing or other applications. 
The Grid location service (GLS) [21] maps client ids to geographic coordinates. A client C_p's location is
saved by the clients closest to the coordinates that p hashes to.

The location management scheme we present here is based on the hash table concept and built on top of
the VSA layer and VSA-to-VSA routing service. VSAs and mobile clients are programmed to form a self-stabilizing,
fault-tolerant distributed data structure for location management, where VSAs serve as home
locations for mobile clients. Each client's id hashes to a VSA region, the client's home location, whose VSA
is responsible for maintaining the location of the client. Whenever a client node C_p would like to locate
another client node C_q, C_p would compute the home location of C_q by applying a predefined global hash
function to C_q's id, and query the region represented by the result of that hash for C_q's location. In order
for our scheme to tolerate crash failures of a limited number of VSAs, each mobile client id actually maps
to a set of VSA home locations; the hash function returns a sequence of region ids as the home locations.
We can use any hash function that provides a sequence of region identifiers; one possibility is a permutation
hash function, where permutations of region ids are lexicographically ordered and indexed by client id.

System constants:
  R, a fixed closed connected region of the 2-D plane.
  U, a finite set of ids for subregions of R.
  m, the size of U.
  region, a mapping from U to connected subsets of R.
  nbrs, a symmetric relation between ids in U.
  r_virt, the supremum distance between points in u and v for any regions u, v where u ∈ nbrs(v).
  P, a finite set of client node ids where P ∩ U = ∅.
  v_max, the maximum client node speed.
  ε_sample, the GPS sample period.
  d, the broadcast message delay.
  e, the delay factor for VSA outputs.
  ttl_VtoV > d, the VtoVComm message delay.
  t_VSAcor, the VSA stabilization time.

System variables:
  now ∈ ℝ, a clock variable, representing real time.
  loc, a continuously updated array of locations in R of mobile nodes, indexed by node id.

Figure 1: System constants and variables.
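The permutation hash function mentioned above can be sketched in Python as follows, assuming integer client ids and a list of region ids. The sketch takes the (id mod m!)-th lexicographic permutation of the region ids, computed with the factorial number system, and returns its first f + 1 entries as the home locations h(q), with f < m as in Section 3. The names `kth_permutation` and `home_locations` are hypothetical.

```python
from math import factorial

def kth_permutation(items, k):
    """The k-th (0-based, taken mod n!) lexicographic permutation of sorted(items)."""
    pool = sorted(items)
    k %= factorial(len(pool))
    result = []
    while pool:
        block = factorial(len(pool) - 1)  # permutations sharing each leading element
        index, k = divmod(k, block)
        result.append(pool.pop(index))
    return result

def home_locations(client_id, region_ids, f):
    """Hypothetical h(q): the first f + 1 region ids of the client's permutation."""
    if not 0 <= f < len(region_ids):
        raise ValueError("need 0 <= f < m")
    return kth_permutation(region_ids, client_id)[: f + 1]

print(home_locations(7, ["u1", "u2", "u3", "u4"], f=1))  # prints ['u2', 'u1']
```

Because the entries of a permutation are distinct, the f + 1 home locations are distinct. For large m one could stop after the first f + 1 positions instead of building the whole permutation.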

End-to-end routing. Another basic, but difficult to provide, service in mobile networks is end-to-end 
routing. Our self-stabilizing implementation of a mobile client end-to-end communication service is simple, 
given VSA-to-VSA routing and the home location service. A client sends a message to another client by 
using the home location service to discover the destination client's region and then has a local VSA forward 
the message to the region using the VSA-to-VSA service. 



Paper organization. The rest of the paper is organized as follows: The system model and the virtual 
automata layer are described in the next section. In Section 3 we describe the problem specifications we 
are interested in. Section 4 describes the VSA-to-VSA communication implementation. In Section 5 we 
describe the implementation of the home location service. In Section 6 we present the implementation of the
end-to-end routing service. Concluding remarks appear in Section 7. 

2 Datatypes and system model 

The system consists of a bounded region of the 2-D plane, where broadcast-enabled, GPS-updated mobile client
nodes are deployed. We assume the Virtual Stationary Automata programming abstraction [8], which includes
both the mobile client nodes and virtual stationary automata (VSAs) the real nodes emulate, as well
as a local broadcast service, V-bcast, between them (see Figure 2). In this section we formally describe the
system, including: (1) the network tiling, (2) the model for the GPS-augmented mobile clients deployed in 
the network, (3) the model for the virtual nodes deployed in the network, and (4) the specification for the 
local broadcast service in the network. A summary table of datatypes, constants, and variables is in Figure 
1. 

2.1 Network tiling 

The deployment space of the network is assumed to be a fixed, closed, and bounded connected region of 
the 2-D plane called R. R is partitioned into known connected subregions called regions, with unique ids 
drawn from the set of region identifiers U. In practice it may be convenient to restrict regions to be regular 
polygons such as squares or hexagons. We define a neighbor relation nbrs on ids from U. This relation holds 
for any two region identifiers u and v where the supremum distance between points in u and v is bounded
by a constant r_virt.
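As an illustration of the neighbor relation, the Python sketch below assumes square regions of side `side`, indexed by (column, row) grid coordinates. For two such squares the supremum distance between their points is the distance between their farthest corners. The grid layout and the function names are assumptions for this example only.

```python
from itertools import product
from math import hypot

def sup_distance(a, b, side):
    """Largest distance between a point of square cell a and a point of square cell b."""
    (i, j), (k, l) = a, b
    return hypot((abs(i - k) + 1) * side, (abs(j - l) + 1) * side)

def build_nbrs(cols, rows, side, r_virt):
    """nbrs(u): the other cells whose supremum distance to u is at most r_virt."""
    cells = list(product(range(cols), range(rows)))
    return {u: {v for v in cells if v != u and sup_distance(u, v, side) <= r_virt}
            for u in cells}

nbrs = build_nbrs(3, 3, side=1.0, r_virt=3.0)
print(sorted(nbrs[(0, 0)]))  # prints [(0, 1), (1, 0), (1, 1)]
```

Since `sup_distance` is symmetric in its arguments, the resulting relation is symmetric, as Figure 1 requires of nbrs.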

2.2 Client nodes 

For each p in the set of physical node identifiers P, we assume a mobile timed I/O automaton client C_p,
whose location in R at any time is referred to as loc(p). Mobile client speed is bounded by a constant v_max.
Clients receive region and time information from the GPS oracle. A GPSupdate(u, now)_p happens every
ε_sample time at each client C_p, indicating to the client the region u where it is currently located and the
current time now. Clients accept this now real-time clock variable as the value of their own local clock. For
simplicity, this local variable progresses at the rate of real time. This implies that, outside of failures, the 
local value of now will equal real time. 

Each client C_p is equipped with a local broadcast service V-bcast (see Section 2.4), allowing it to communicate
with the VSAs and clients of its own and neighboring regions through bcast(m)_p and brcv(m)_p.

Clients are susceptible to stopping and corruption failures. After a stopping failure, a client performs no 
additional local steps until restarted. If restarted, it starts again from an initial state. If a node suffers from 
a corruption, it experiences a nondeterministic change to its program state. 

Additional arbitrary external interface actions and local state used by algorithms running at the client 
are allowed. For simplicity local steps are assumed to take no time. 

2.3 Virtual Stationary Automata (VSAs) 

Here we describe VSAs; a self-stabilizing implementation of such machines using a GPS oracle and the 
physical mobile nodes in the system can be found in [8, 9]. An abstract VSA is a timing-enabled virtual 
machine that may be emulated by the physical mobile nodes in its region in the network. We formally 
describe a timed machine for region u, V_u, as a TIOA whose program is a tuple of its action signature, sig_u,
valid states, states_u, a start state function mapping clock values to start states, start_u, a discrete transition
function, δ_u, and a set of valid trajectories, τ_u. Trajectories [18] describe state evolution over intervals of
time. The state of V_u is referred to collectively as vstate and is assumed to include a variable corresponding
to real time, vstate.now. 

To guarantee that we can emulate a VSA using physical mobile nodes, its interface must be emulatable 
by the nodes. Hence, a VSA V_u's external interface is restricted to be similar, including only stopping failure,
corruption, and restart inputs, and the ability to broadcast and receive messages. Corruption failures result 
in a nondeterministic change to vstate. 

Since a VSA is emulated by physical nodes (corresponding to clients) in its region, its failures are defined
in terms of client failures in its region: (1) if no clients are in the region, the VSA is crashed; (2) if no
failures of clients (corruption or stopping) occur in an alive VSA's region over some interval, the VSA does
not suffer a failure during that interval; and (3) a VSA may suffer a corruption only if a mobile client in its
region suffers a corruption. The self-stabilizing implementation of a VSA in [8, 9] guarantees that within
t_VSAcor of an arbitrary configuration of the emulation, the emulation's external trace will look like that of
the abstract VSA, starting from a corrupted abstract state.

While an emulation of V_u would ideally be identical to a legitimate execution of V_u, an abstraction must
reflect that, due to message delays or node failure, the emulation might be behind real time, appearing to be
delayed in performing outputs by up to some time e. The emulation is then a delay-augmented TIOA, an
augmentation of V_u with timing perturbations, represented with buffers Dout[e]_u composed with V_u's
outputs. The buffer delays messages by a nondeterministic time in [0, e], where e is more than V-bcast's
broadcast delay, d (see Section 2.4). Programs must take into account e, as they do d.

2.4 Local broadcast service (V-bcast) 

Communication is in the form of local broadcast service V-bcast, with broadcast radius r_virt and message
delay d. It allows communication between VSAs and clients in the same or neighboring regions. The service
allows the broadcasting and receiving of message m at each port i ∈ P ∪ U through bcast(m)_i and brcv(m)_i.





Figure 2: Virtual Stationary Automata layer. VSAs 
and clients communicate locally using V-bcast. VSA 
outputs may be delayed in Dout. 



We assume that V-bcast guarantees two properties between VSAs and between VSAs and clients: integrity
and reliable local delivery. Integrity guarantees that for any brcv(m)_i that occurs, a bcast(m)_j, j ∈
P ∪ U, previously occurred. Reliable local delivery roughly guarantees that a transmission will be received by
nearby ports: If port i, where i is a client or VSA port in any region u, transmits a message, then every port 
j, whether a client or VSA port, in region u or neighboring regions during the entire time interval starting 
at transmission and ending d later receives the message by the end of the interval. (For this definition, due 
to GPSupdate lag, a client is still said to be "in" region u even if it has just left region u but has not yet 
received a GPSupdate with the change.) 

In practice, a broadcast service has bounded buffers. We assume buffers are large enough that overflows
do not occur in normal operation. In the event of overflow, overflow messages are lost. 

3 Problem specifications 

We describe the services we will build over the VSA layer: VSA-to-VSA routing, a location service, and 
client-to-client routing, and describe our requirement that implementations be self-stabilizing. 

The following constants (explained/used shortly) are globally known: (1) f < m, a limit on "home
location" VSA failures for a client, (2) h, a function mapping each client id to a sequence of f + 1 distinct
region ids, (3) ttl_VtoV > d, delivery time for the VtoVComm service, (4) ttl_hls > ε_sample + 2d + 3e + 2ttl_VtoV,
response time of the location management service, and (5) ttl_hb, a refresh period. We assume the following
client mobility and VSA crash failure conditions: 

(1) Each client spends at least ε_sample time in a region before moving to another region,

(2) At any time, each alive client's current region or a neighboring region has a non-crashed VSA that 
remains alive for an additional ttl_hls time,

(3) For any interval of length ttl_VtoV + e, any two VSAs alive over the interval are connected via at least one
path of non-crashed VSAs over the entire interval, and 

(4) For any interval of length ttl_hb + 2ttl_VtoV + 2e + d, and any alive client q, at least one VSA from h(q)
does not crash during the interval. 

3.1 VSA-to-VSA communication service (VtoVComm) specification 

The first service is an inter-VSA routing service, where a VSA from some region u can send a message m
through VtoVsend(v, m)_u to a VSA in another (potentially non-neighboring) region v. Region v's VSA later
receives m through VtoVrcv(m)_v. The service guarantees two properties:

(1) If a VSA at region u performs a VtoVsend(v, m), and both region u and v VSAs are alive over the
time interval beginning with the send and ending ttl_VtoV time later, then the VSA at region v performs a
VtoVrcv(m) before the end of the interval, and 

(2) If a message is received at some VSA, it was previously sent to that VSA. 

3.2 Location service specification 

A location service answers queries from clients for the locations of other clients. A client node p can submit
a query for a recent region of client node q via a HLquery(q)_p action. If few home location failures occur and
q has been in the system for a sufficient amount of time, the service responds within bounded time with a
recent region location of q, qreg, through a HLreply(q, qreg)_p action.

To be more exact, the location service guarantees that if a client p performs a HLquery to find an alive
client q that has been in the system longer than ε_sample + d + ttl_VtoV + e + ttl_hls time, and client p does
not crash or move to a different region for ttl_hls time, then:

(1) Within ttl_hls time, client p will perform a HLreply with a region for q, and

(2) If p performs a HLreply(q, qreg), then p had requested q's location and q was either: (a) alive in region
qreg within the last ttl_hls time, or (b) failed for at most ttl_hb + ttl_hls − ε_sample time.

3.3 Client end-to-end routing (EtoEComm) specification 

End-to-end routing is an important application for ad-hoc networks. The V-bcast service provides a local 
broadcast service where VSAs and clients can communicate with VSAs and clients in neighboring regions. 
VtoVComm allows arbitrary VSAs to communicate. End-to-end routing (EtoEComm) allows arbitrary
clients to communicate: a client p sends message m to client q using send(q, m)_p, which is received by q in
bounded time via receive(m)_q.

If clients p and q do not crash for ttl_hls time, clients do not change regions for ttl_hls time after a send,
and q has been in the system at least ttl_hls + ε_sample + d + ttl_VtoV + e time, then:

(1) If client p sends message m to q, q will receive m within ttl_hls + 2d + 2e + ttl_VtoV time, and

(2) Any message received by a client was previously sent to the client. 

3.4 Self-stabilizing implementations

We require implementations of the above services to be self-stabilizing. A system configuration is safe with 
respect to a specification and implementation if any admissible execution fragment of the implementation 
starting from the configuration is an admissible execution fragment of the specification. An implementation
is self-stabilizing if starting from any configuration, an admissible execution of the implementation
eventually reaches a safe configuration. Notice that in the presence of corruptions, if an implementation is 
self-stabilizing, then any long enough execution fragment of the implementation will eventually have a suffix 
that looks like the suffix of some correct execution of the specification, until a corruption occurs. 

Each of the above services' self-stabilizing implementations will be built on top of self-stabilizing implementations
of other services: VtoVComm over the VSA layer, the location service over the VSA layer
and VtoVComm service, and EtoEComm over the VSA layer, VtoVComm, and location services. Each self-stabilizing
implementation uses lower level services without feedback, so lower level service executions are not
influenced by the upper level services. This allows us to guarantee that higher level service implementations 
are still self-stabilizing through fair composition [11]. 

Our service implementations, starting from an arbitrary system configuration, stabilize within the following
times: VtoVComm: ttl_VtoV + d time after the VSA layer stabilizes (t_VSAcor time); the location
service: max(ttl_hls, 2e + 2ttl_VtoV + ttl_hb + 2d) time after VtoVComm stabilizes; and EtoEComm:
ttl_hls + 2d + 2e + ttl_VtoV time after the location service has stabilized.

4 VSA to VSA communication (VtoVComm) implementation 

The VtoVComm service allows communication of messages between any two VSAs through VtoVsend and VtoVrcv
actions, as long as there is a path of non-failed VSAs between them. The VtoVComm service is built on top of
the V-bcast service [8], which supports communication between two neighboring VSAs (see Figure 3).

VSA-to-VSA communication is based on a greedy DFS procedure. When a VSA receives a message for which it is
not the destination, it chooses a neighboring VSA that is on a shortest path to the destination VSA and
forwards the message in a forward message to that neighbor. If the VSA does not receive, within some bounded
amount of time, an indication through a found message that the message has been delivered to the destination,
it then forwards the message to the neighboring VSA on the next shortest path to the destination VSA, and so
on. This choice of neighbors is greedy in the sense that the next neighbor chosen to receive the forwarded
message is the one on a shortest path to the destination VSA, excluding the neighbors associated with previous
tries. The greedy DFS can turn into a flood in pathological situations in which the destination is the last
VSA reached.
Figure 3: VSA-to-VSA communication (VtoVComm). A VSA at region u sends a message m to region v's VSA with a
VtoVsend(v, m)_u. The message is eventually received at region v by VtoVrcv(m)_v.
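The following Python sketch simulates, sequentially and without timing, the order in which a greedy DFS tries VSAs. It assumes the set of alive VSAs is static for the duration of the search and measures "shortest path" as hop distance in the nbrs graph. A forward sent to a crashed or already-visited VSA is never answered, so it is treated as a timeout after which the next candidate is tried. The graph encoding, tie-breaking by id, and function names are assumptions for illustration.

```python
from collections import deque

def hop_distances(graph, dest):
    """BFS hop counts from every region to dest over the static nbrs graph."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def greedy_dfs_route(graph, alive, src, dest):
    """Path along which a greedy DFS from src reaches dest, or None if it never does."""
    if src not in alive:
        return None
    dist = hop_distances(graph, dest)
    visited = {src}

    def explore(u, path):
        if u == dest:
            return path
        # Try neighbours in order of hop distance to dest (ties by id). A crashed or
        # already-visited neighbour never answers, which models a timeout: move on.
        for v in sorted(graph[u], key=lambda w: (dist.get(w, float("inf")), w)):
            if v in alive and v not in visited:
                visited.add(v)
                found = explore(v, path + [v])
                if found is not None:
                    return found
        return None  # every neighbour exhausted: backtrack

    return explore(src, [src])

graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "e": ["c", "d"], "d": ["b", "e"]}
print(greedy_dfs_route(graph, set(graph) - {"b"}, "a", "d"))  # prints ['a', 'c', 'e', 'd']
```

With every VSA alive the search goes straight along a shortest path; the example shows the backtracking step taking over when the preferred neighbor has crashed.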





 1  Signature:
 2    Input VtoVsend(d, m)_u, d ∈ U, m arbitrary
 3    Input brcv(m)_u, m ∈ ({forward} × Msg × U × {u})
 4                         ∪ ({found} × Msg)
 5    Output bcast(m)_u, m arbitrary
 6    Output VtoVrcv(m)_u, m arbitrary
 7    Internal DFStimeout(msg)_u, msg ∈ Msg
 8    Internal DFSclean(msg)_u, msg ∈ Msg
 9    Msg = M × U × U × ℝ, of the form (m, v2vs, v2vd, ts)
10
11  State:
12    analog now ∈ ℝ, the current real time
13    bcastq, VtoVrcvq, queues of messages, initially ∅
14    DFStable, a table indexed on message tuples in
15      Msg with entries in (nbrs(u) × 2^nbrs(u) × ℝ),
16      of the form (isrc, NbrSet, nbrTO), initially ∅
17    curNbr ∈ U, initially ⊥
18
19  Trajectories:
20    satisfies
21      d(now) = 1
22      constant bcastq, VtoVrcvq, DFStable, curNbr
23    stops when
24      Any precondition is satisfied.
25
26  Actions:
27    Output bcast(m)_u
28    Precondition:
29      m ∈ bcastq
30    Effect:
31      bcastq ← bcastq \ {m}
32
33    Input VtoVsend(d, m)_u
34    Effect:
35      if u = d then
36        VtoVrcvq ← VtoVrcvq ∪ {m}
37      else DFStable((m, u, d, now)) ← (u, nbrs(u), now)
38    Internal DFStimeout(msg)_u
39    Precondition:
40      DFStable(msg).nbrTO < now
41      ∨ DFStable(msg).nbrTO > now + δ(u, msg.v2vd)
42    Effect:
43      if DFStable(msg).NbrSet ≠ ∅ then
44        curNbr ← NxtNbr(DFStable(msg).NbrSet,
45                        DFStable(msg).isrc, u, msg.v2vd)
46        DFStable(msg).NbrSet ← DFStable(msg).NbrSet \ {curNbr}
47        bcastq ← bcastq ∪ {(forward, msg, u, curNbr)}
48        DFStable(msg).nbrTO ← now + δ(u, msg.v2vd)
49      else DFStable(msg) ← null
50
51    Input brcv((forward, msg, isrc, u))_u
52    Effect:
53      if msg.ts ∈ [now − ttl_VtoV, now] then
54        if u = msg.v2vd then
55          bcastq ← bcastq ∪ {(found, msg)}
56          VtoVrcvq ← VtoVrcvq ∪ {msg.m}
57        else if DFStable(msg) = null then
58          DFStable(msg) ← (isrc, nbrs(u) \ {isrc}, now)
59
60    Input brcv((found, msg))_u
61    Effect:
62      if DFStable(msg) ≠ null then
63        DFStable(msg) ← null
64        if u ≠ msg.v2vs then
65          bcastq ← bcastq ∪ {(found, msg)}
66
67    Output VtoVrcv(m)_u
68    Precondition:
69      m ∈ VtoVrcvq
70    Effect:
71      VtoVrcvq ← VtoVrcvq \ {m}
72
73    Internal DFSclean(msg)_u
74    Precondition:
75      DFStable(msg) ≠ null ∧ msg.ts ∉ [now − ttl_VtoV, now]
76    Effect:
77      DFStable(msg) ← null

Figure 4: Greedy DFS algorithm at V_u^VtoV for region u.



Self-stabilization of the algorithm is ensured by the use of a real-time timestamp to identify the version 
of the DFS. Versions that are too old are eliminated from the system, and new versions are handled as completely
new attempts to complete a greedy DFS towards the destination.
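A minimal Python sketch of this timestamp window follows, assuming message tuples of the form (m, v2vs, v2vd, ts) as in Figure 4. An entry survives only while its timestamp lies in [now − ttl_VtoV, now], so both stale versions and corrupted future-dated entries are dropped. Unlike the per-entry DFSclean action, the sketch cleans the whole table in one pass; the function names are hypothetical.

```python
def in_window(ts, now, ttl_vtov):
    """True iff ts lies in [now - ttl_vtov, now]; future (corrupted) stamps fail too."""
    return now - ttl_vtov <= ts <= now

def accept_forward(msg, now, ttl_vtov):
    """Guard of the forward-receive action: only recent versions of msg are processed."""
    return in_window(msg[3], now, ttl_vtov)   # msg = (m, v2vs, v2vd, ts)

def dfs_clean(dfs_table, now, ttl_vtov):
    """Apply DFSclean to every entry at once: keep only entries inside the window."""
    return {msg: entry for msg, entry in dfs_table.items()
            if entry is not None and in_window(msg[3], now, ttl_vtov)}

table = {
    ("m1", "u", "v", 95.0): "fresh",
    ("m2", "u", "v", 10.0): "stale",     # too old: a finished or abandoned version
    ("m3", "u", "v", 500.0): "future",   # can only come from a corrupted state
}
print(sorted(k[0] for k in dfs_clean(table, now=100.0, ttl_vtov=20.0)))  # prints ['m1']
```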

We first present a simple greedy DFS algorithm that gradually expands the search until all paths are 
checked. This algorithm will find a path to the destination if such a path exists throughout the DFS 
execution. We also present a modification of the algorithm to produce a persistent version of the greedy 
DFS algorithm in which each VSA repeatedly tries to forward messages along previously unsuccessful VSA 
paths to take advantage of (possibly temporary) recoveries of VSAs that may result in a viable path [13]. 
Again, the persistent greedy DFS can turn into a persistent flood in pathological situations in which the 
destination is the last VSA reached. 

4.1 Detailed code description 

The following code description refers to the code for VSA V_u^VtoV in Figure 4. The main state variable
DFStable keeps track of information for messages that are still waiting to be delivered. For each such
unique message, the table stores the intermediate source isrc of the message, the set NbrSet of VSA neighbors
to which the message has not yet been forwarded, and a timeout nbrTO for the neighbor currently being tried
for forwarding the message.

A source VSA V_u^VtoV sends a message m to a destination VSA in region d using VtoVsend(d, m)_u (line 33).
If u = d, then V_u^VtoV delivers m to itself through VtoVrcv(m)_u (lines 35-36). Otherwise, the destination
is another VSA, and V_u^VtoV sets the DFStable mapping of an augmented version of the message, (m, u, d, now),
to (u, nbrs(u), now). This starts a new DFS execution that forwards the message to its destination
(line 37).

Internal DFStimeout(msg)_u
Precondition:
    DFStable(msg).nbrTO ≤ now ∨ DFStable(msg).nbrTO > now + δ(u, msg.v2vd)
Effect:
    if DFStable(msg).NbrSet ≠ ∅ then
        curNbr ← NxtNbr(DFStable(msg).NbrSet, DFStable(msg).isrc, u, msg.v2vd)
        DFStable(msg).NbrSet ← DFStable(msg).NbrSet \ {curNbr}
        for each n ∈ nbrs(u) \ DFStable(msg).NbrSet
            bcastq ← bcastq ∪ {⟨forward, msg, u, n⟩}
        DFStable(msg).nbrTO ← now + δ(curNbr, msg.v2vd)
    else DFStable(msg) ← null

Figure 5: The Persistent Greedy DFS algorithm at V_u^VtoV for region u is the same as the Greedy DFS
algorithm, except that the broadcast of a DFS message to curNbr in the DFStimeout action is replaced
with a broadcast to curNbr and all previously attempted neighbors.

Whenever the nbrTO of a message in DFStable times out, it triggers the forwarding of the message to the
next neighbor in the DFS, if possible. If the message has not yet been forwarded to all of the relevant neighbors
(DFStable(msg).NbrSet is not empty), then the neighbor closest to the destination VSA that has not yet had
the message forwarded to it, curNbr, is selected, and the message tuple msg is forwarded to it in a forward
message using the V-bcast service (lines 45-48). The timeout DFStable(msg).nbrTO for this forwarding attempt
is set to now + δ(curNbr, msg.v2vd) (line 49). If the message has already been forwarded to all the relevant
neighbors, then DFStable(msg) is set to null, indicating that nothing more can be done.

If a message tuple msg whose destination is V_u^VtoV is received in a forward message from isrc, then
VSA V_u^VtoV broadcasts a (found, msg) message via the V-bcast service and VtoVrcv's the message msg.m.
The found message notifies neighbors still participating in the DFS for msg that the message has reached its
final destination VSA; no forwarding is required (lines 55-57). Otherwise, if msg is not destined for V_u^VtoV
and V_u^VtoV does not already have an entry in DFStable for msg, then the message must be forwarded toward its
destination. DFStable(msg) is set to (isrc, nbrs(u) \ {isrc}, now) (line 59), which stores the intermediate
source, initializes the set of neighbors to which the message has not yet been forwarded, and sets nbrTO to
now. Setting nbrTO to now immediately enables the DFStimeout action for msg, triggering the forwarding
of msg to one of V_u^VtoV's neighbors.

When a found message is received for a message tuple msg that is mapped by DFStable, the entry in
DFStable is erased, preventing additional forwarding (line 64). If u ≠ msg.v2vs, then VSA V_u^VtoV broadcasts
a found message via the V-bcast service (lines 65-66), notifying neighbors that are still participating in the
DFS for msg that it has been delivered. If u = msg.v2vs, then no found message is required and no further
action needs to be taken.
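To make the control flow of Figures 4 and 5 concrete, the following Python sketch simulates the (non-persistent) greedy DFS for a single message, sequentially. It rests on several assumptions that are not in the original: regions are grid cells, the NxtNbr heuristic is modeled as Manhattan distance to the destination with ties broken by region, timeouts are abstracted away (a failed neighbor, or one that already holds a DFStable entry, simply never answers), and a VSA that has taken part in the search is never re-entered. The names `greedy_dfs_route`, `fresh`, and `grid` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Region = Tuple[int, int]  # assumption: regions are grid cells (x, y)


def dist(a: Region, b: Region) -> int:
    """Manhattan distance; a stand-in for the NxtNbr closeness heuristic."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def fresh(ts: float, now: float, ttl_vtov: float) -> bool:
    """Timestamp window used by brcv(forward) and DFSclean: ts in [now - ttl, now]."""
    return now - ttl_vtov <= ts <= now


def greedy_dfs_route(src: Region, dst: Region,
                     nbrs: Dict[Region, List[Region]],
                     alive: Set[Region]) -> Optional[List[Region]]:
    """Return the regions reached by forward messages, in order, or None."""
    if src not in alive:
        return None
    # Per-VSA DFStable entry for this one message: (isrc, NbrSet).
    table: Dict[Region, Tuple[Optional[Region], Set[Region]]] = {
        src: (None, set(nbrs[src]))}
    trace = [src]
    cur = src
    while cur != dst:
        isrc, nbr_set = table[cur]
        # Only live neighbors without an entry would act on a forward message.
        candidates = [n for n in nbr_set if n in alive and n not in table]
        if candidates:
            nxt = min(candidates, key=lambda n: (dist(n, dst), n))
            nbr_set.discard(nxt)
            table[nxt] = (cur, set(nbrs[nxt]) - {cur})
            trace.append(nxt)
            cur = nxt
        elif isrc is None:
            return None  # the source exhausted every neighbor: no path
        else:
            cur = isrc   # NbrSet exhausted: control returns to isrc
    return trace


def grid(size: int) -> Dict[Region, List[Region]]:
    """4-neighbor adjacency for a size x size grid of regions."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(x, y): [(x + dx, y + dy) for dx, dy in steps
                     if 0 <= x + dx < size and 0 <= y + dy < size]
            for x in range(size) for y in range(size)}
```

For example, with the center region of a 3 × 3 grid failed, `greedy_dfs_route((0, 0), (2, 2), grid(3), set(grid(3)) - {(1, 1)})` returns `[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]`, routing around the hole. If the first choice dead-ends, control backtracks to the intermediate source, which then tries its next-closest neighbor.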

4.2 Correctness 

We now prove the correctness of the algorithm. Let the source VSA be V_s^VtoV, the destination VSA be
V_d^VtoV, the message sent be m, and a DFS execution exe from V_s^VtoV to V_d^VtoV be as defined above. We
assume a given function δ : U × U → ℕ, where δ(x, y) is a bound on the time required for a message
to arrive from x to y. This bound is based both on the distance between x and y and on the quality of the
communication links in the network. Since the DFS and the δ function are employed only to cut down on
unneeded retransmissions of messages, any non-negative wait time is sufficient for correctness. However, a
wait time that depends on the hop count between regions will be the most message-efficient. We argue that if no
corruption failures occur and the status (failed or non-failed) of every VSA in U does not change during exe,
then the following holds:

Lemma 4.1 If V_s^VtoV is a non-failed VSA that performs a VtoVsend(d, m) at time t, and there exists a
path of non-failed VSAs between V_s^VtoV and V_d^VtoV from time t to time t + ttl_VtoV, then V_d^VtoV performs a
VtoVrcv(m) in the interval [t, t + ttl_VtoV], for
$ttl_{VtoV} > \left[e + d + \max_{u,v \in U} \delta(u,v) \cdot \left(\max_{u \in U} |nbrs(u)| - 1\right)\right] \cdot (|U| - 1)$.

Proof sketch: The proof is by induction on the distance n between V_s^VtoV and V_d^VtoV on the shortest
non-deserted path, where the distance is the number of VSAs along the path, including V_d^VtoV. In the case
n = 0, the message m is destined for the same VSA; according to line 35, the message is VtoVrcv'ed at the VSA.

Assume that the lemma holds for every n' < n.

Let n be the VSA-distance between V_s^VtoV and V_d^VtoV. There exists a path of non-failed VSAs between
V_s^VtoV and V_d^VtoV. Therefore, there exists a VSA V_{s'}^VtoV, a neighbor of V_s^VtoV, such that there
exists a path of non-failed VSAs between V_{s'}^VtoV and V_d^VtoV. The distance between V_{s'}^VtoV and
V_d^VtoV is n − 1, hence the induction assumption holds for V_{s'}^VtoV and V_d^VtoV. Therefore, a message
sent from V_{s'}^VtoV to V_d^VtoV eventually reaches V_d^VtoV. The same assumption holds for V_s^VtoV and
V_{s'}^VtoV; therefore, V_d^VtoV receives the message m sent from region s. ■
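The condition on ttl_VtoV in Lemma 4.1 can be evaluated directly for a given region graph, which shows how it scales: linearly in |U| and in the maximum neighbor count. The sketch below follows the reading of the bound given in the lemma statement above; the function name is hypothetical, δ is passed in as an arbitrary function, and the per-term comments are our interpretation rather than the authors' derivation.

```python
from typing import Callable, Dict, Hashable, List


def ttl_vtov_lower_bound(e: float, d: float,
                         nbrs: Dict[Hashable, List[Hashable]],
                         delta: Callable[[Hashable, Hashable], float]) -> float:
    """Right-hand side of the Lemma 4.1 condition; ttl_VtoV must exceed it."""
    regions = list(nbrs)
    max_delta = max(delta(u, v) for u in regions for v in regions)
    max_deg = max(len(nbrs[u]) for u in regions)
    # Interpretation: per hop, e + d plus up to (max_deg - 1) neighbor timeouts.
    per_hop = e + d + max_delta * (max_deg - 1)
    # A simple path over |U| VSAs has at most |U| - 1 hops.
    return per_hop * (len(regions) - 1)
```

For a three-region line a–b–c with e = 1, d = 2, and δ ≡ 3, the bound is (1 + 2 + 3 · 1) · 2 = 12, so any ttl_VtoV > 12 satisfies the lemma's hypothesis.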

Lemma 4.2 The number of times that a message tuple msg is re-broadcast is bounded.
Proof sketch: The broadcast of a message tuple stops in either of the following cases:

• A found message was received for msg. According to line 62, if the value of DFStable(msg) was not
already null, it gets set to null, preventing V_u^VtoV from doing anything with subsequent found messages.
If V_u^VtoV was not the original source of msg, it retransmits found for msg exactly once. If a found
for msg is received again, it will be ignored. A forward message for msg would need to be received
again in order to result in any additional found messages for msg at this VSA. This, however, cannot
happen, since each VSA participating in the DFS waits until found messages would have been returned
before triggering new forward messages.

• For each VSA neighbor, if VSA V_u^VtoV does not receive a found message for msg, it will time out via
nbrTO. Once the set of neighbors to be queried is exhausted, the VSA erases the entry for msg in
DFStable, preventing any additional forwarding by itself.



Lemma 4.3 Once corruptions stop and the VSA layer has stabilized, it takes up to d + ttl_VtoV time for
VtoVComm to stabilize.

Proof sketch: Any message in the system that is being forwarded by VtoVComm will be cleaned out of
the system if it is older than ttl_VtoV or newer than the current time. As a result, the longest a "bad"
message can remain in the system is this time, plus up to an additional d time during which it could have
been in transmission before being received by a VSA. ■

5 Home Location Service (HLS) implementation 

The location service, as described in the last section, allows a client to determine a recent region of another
alive client. In our implementation, called the Home Location Service (HLS), we accomplish this using home
locations. Recall that the home locations of a client node p are f + 1 regions whose VSAs are occasionally
updated with p's region. The home locations are calculated with a hash function h, which maps a client's id to
a list of VSA regions and is known to all VSAs. These home location VSAs can then be queried by other
VSAs to determine a recent region of p.
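The paper does not fix a particular hash function. One way to realize h, assuming the regions are available as a list whose order is known to all VSAs, is to hash the client id together with a counter (SHA-256 here, an illustrative choice) and skip repeats, so that the f + 1 home locations are distinct (h(p, x) ≠ h(p, y) for x ≠ y). `home_locations` is a hypothetical helper; element i − 1 of its result plays the role of h(p, i).

```python
import hashlib
from typing import List, Sequence, TypeVar

R = TypeVar("R")


def home_locations(pid: str, regions: Sequence[R], f: int) -> List[R]:
    """Return [h(pid, 1), ..., h(pid, f + 1)]: f + 1 distinct regions.

    Deterministic, so every VSA with the same ordered region list computes
    the same home locations for pid without any coordination.
    """
    if f + 1 > len(regions):
        raise ValueError("need at least f + 1 regions")
    chosen: List[R] = []
    counter = 0
    while len(chosen) < f + 1:
        digest = hashlib.sha256(f"{pid}:{counter}".encode()).digest()
        region = regions[int.from_bytes(digest[:8], "big") % len(regions)]
        if region not in chosen:  # enforce distinct home locations
            chosen.append(region)
        counter += 1
    return chosen
```

Because the mapping depends only on the id and the shared region list, any VSA can compute a client's home locations locally, which is what lets a querying VSA contact them without first learning anything from the client.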

Figure 6 depicts how the VSA abstraction and VtoVComm are used in HLS. The HLS implementation
consists of two parts: a client-side portion and a VSA-side portion. C_p^HL is a subautomaton of client p
that interacts with VSAs to provide HLS. It is responsible for notifying VSAs in its current and neighboring
regions which region it is in. Also, C_p^HL handles each request for q's region submitted by input HLquery(q)_p,
by broadcasting the query via V-bcast to VSAs V_u^HL in its current and neighboring regions. It translates
responses from the VSAs into HLreply outputs.
Figure 6: Home Location Service. A client p can query local VSAs for client q's region. The VSAs then
query home locations of q, using VtoVComm, for a recent region of q, and return it to p.



For the VSA side, V_u^HL and V_v^HL in Figure 6 are home location VSAs corresponding to regions u and v
of the network; they are subautomata of VSAs V_u and V_v. V_u^HL takes a request from a local client for client
node q's region, calculates q's home locations using the hash function, and then sends location queries to the
home locations using VtoVComm. Those virtual automata respond with the region information they have
for q, which is then provided by V_u^HL to the requesting client. V_u^HL is also responsible for informing
the home locations of each client p located in its region or a neighboring region of p's region, and for
maintaining and answering queries for the regions of clients for which it is a home location.

Time and region information from the GPS oracle is used throughout the HLS algorithm, by clients and 
VSAs, to timestamp and label information and messages. This information is used to guarantee timeliness 
of replies from the HLS service, and to stabilize the service after faults. Timestamps are used to determine 
if information is too old or too new, while region information allows clients and VSAs to know which other 
clients and VSAs to interact with. 

5.1 HLS client actions 

The code executed by client p's C_p^HL is in Figure 7.

Clients receive GPSupdates every e_sample time from the GPS automaton (lines 28-33), making them aware
of their current region and the time. If a client's region has changed, the client immediately sends a heartbeat
message with its id, current time and region information. The client periodically reminds its current and
neighboring region VSAs of its region by broadcasting additional heartbeat messages every ttl_hb time, where
ttl_hb is a known constant (lines 35-39).

C_p^HL also handles the HLquery(q) inputs it receives (line 41). This request for q's location is stored in
a queryq table and, once the client knows its own region, translated into a (clocQuery, q) message that is
broadcast, together with the client's current region, to local regions' VSAs (lines 45-49). If C_p^HL eventually
receives a (clocReply, q, qreg) message from its current or neighboring region's VSA for a client q in queryq,
indicating that node q was in region qreg (lines 51-55), it clears the entry for q in queryq and outputs an
HLreply(q, qreg) of the information (lines 57-61). If the request for q's location goes unanswered for more than
ttl_HLS − e_sample time, then the request has failed and is removed (lines 63-67).
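The client-side logic above can be sketched as a small Python class. This is a simplification under stated assumptions: the client's clock advances only at GPSupdate steps, every enabled output or internal action fires immediately at that time, regions are integers, and the neighbor map is supplied by the caller. `HLClient` and its methods are hypothetical names; the outbox merely records what would be broadcast.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class HLClient:
    """Discrete-step sketch of C_p^HL; now advances only on GPS updates."""
    pid: str
    ttl_hb: float
    ttl_hls: float
    e_sample: float
    now: Optional[float] = None
    reg: Optional[int] = None
    hb_to: float = math.inf
    queryq: Dict[str, float] = field(default_factory=dict)
    outbox: List[tuple] = field(default_factory=list)

    def gps_update(self, region: int, t: float) -> None:
        self.now = t
        if self.reg != region:
            self.reg = region
            self.hb_to = t  # region changed: heartbeat due immediately
        self._fire()

    def hl_query(self, q: str) -> None:
        self.queryq[q] = math.inf  # marks a query not yet broadcast
        self._fire()

    def cloc_reply(self, q: str, qreg: int, from_region: int,
                   nbrs: Dict[int, Sequence[int]]) -> Optional[Tuple[str, str, int]]:
        """Accept a reply only from the current or a neighboring region's VSA."""
        local = from_region == self.reg or from_region in nbrs.get(self.reg, ())
        if local and q in self.queryq:
            del self.queryq[q]
            return ("HLreply", q, qreg)
        return None

    def _fire(self) -> None:
        """Perform every enabled heartbeat, clocQuery, and queryfail step."""
        if self.now is None or self.reg is None:
            return
        if self.hb_to <= self.now:
            self.outbox.append(("heartbeat", self.now, self.pid, self.reg))
            self.hb_to = self.now + self.ttl_hb
        deadline = self.now + self.ttl_hls - self.e_sample
        for q in list(self.queryq):
            if self.queryq[q] > deadline:    # new (or corrupted) entry: send
                self.outbox.append(("clocQuery", q, self.reg))
                self.queryq[q] = deadline
            elif self.queryq[q] < self.now:  # queryfail: request expired
                del self.queryq[q]
```

Note that the same comparison that sends a fresh query (an entry of ∞) also resets any entry that lies implausibly far in the future, which mirrors how the automaton's preconditions double as local corrections after a corruption.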

5.2 HLS VSA actions 

The code for automaton V_u^HL appears in Figure 8.

First, the VSA knows which clients are in its region or neighboring regions through heartbeat messages. If a
VSA hears a heartbeat message from a client p claiming to be in its region or a neighboring region, the
VSA sends a locUpdate message for p, with p's heartbeat timestamp and region, through VtoVComm to the
VSAs at the home locations of client p (lines 42-46), where home locations are computed using the known hash
function h from P × {1, …, f + 1} to U.

Constants:
    ttl_hb
    ttl_HLS

Signature:
    Input GPSupdate(v, t)_p, v ∈ U, t ∈ ℝ
    Input HLquery(q)_p, q ∈ P
    Input brcv(⟨m, u⟩)_p, m ∈ ({clocReply} × P × U), u ∈ U
    Output bcast(⟨m, reg⟩)_p, m ∈ {⟨heartbeat, now, p⟩} ∪ ({clocQuery} × P)
    Output HLreply(q, v)_p, q ∈ P, v ∈ U
    Internal queryfail(q)_p, q ∈ P

State:
    analog now ∈ ℝ, current real time, initially ⊥
    hbTO ∈ ℝ, the next heartbeat time
    reg ∈ U, the current region, initially ⊥
    queryq, a table from P to ℝ, initially ∅
    queryrcv, a queue of P × U pairs, initially ∅

Trajectories:
    satisfies
        d(now) = 1
        constant hbTO, reg, queryq, queryrcv
    stops when
        Any precondition is satisfied.

Actions:
    Input GPSupdate(v, t)_p
    Effect:
        now ← t
        if reg ≠ v then
            reg ← v
            hbTO ← now

    Output bcast(⟨⟨heartbeat, now, p⟩, reg⟩)_p
    Precondition:
        hbTO ≤ now ∧ reg ≠ ⊥
    Effect:
        hbTO ← now + ttl_hb

    Input HLquery(q)_p
    Effect:
        queryq(q) ← ∞

    Output bcast(⟨⟨clocQuery, q⟩, reg⟩)_p
    Precondition:
        reg ≠ ⊥ ∧ queryq(q) > now + ttl_HLS − e_sample
    Effect:
        queryq(q) ← now + ttl_HLS − e_sample

    Input brcv(⟨⟨clocReply, q, qreg⟩, u⟩)_p
    Effect:
        if (u ∈ nbrs(reg) ∪ {reg} ∧ queryq(q) ≠ null) then
            queryrcv ← queryrcv ∪ {⟨q, qreg⟩}
            queryq(q) ← null

    Output HLreply(q, qreg)_p
    Precondition:
        ⟨q, qreg⟩ ∈ queryrcv
    Effect:
        queryrcv ← queryrcv \ {⟨q, qreg⟩}

    Internal queryfail(q)_p
    Precondition:
        queryq(q) < now
    Effect:
        queryq(q) ← null

Figure 7: HLS's C_p^HL automaton. This client subautomaton serves as a bridge between the client's
requests and the VSA layer.

When a VSA receives one of these locUpdate messages for a client p, it stores both the region indicated
in the message as p's current region and the attached heartbeat timestamp in its loc table (lines 48-51).
This location information for p is refreshed each time the VSA receives a locUpdate for client p with a newer
heartbeat timestamp. A client sends a heartbeat message every ttl_hb time; the heartbeat can take up to d + e
time to arrive at a VSA and trigger it to send a locUpdate message through VtoVComm, and that message can
take ttl_VtoV time to be delivered at a home location. Therefore, an entry for client p is erased if its timestamp
is older than ttl_hb + d + e + ttl_VtoV (lines 53-57).
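The loc bookkeeping at a home location VSA reduces to a timestamped table with a freshness window. The sketch below is a minimal Python rendering of the locUpdate and cleanLoc steps, assuming string client ids and integer regions; `LocTable` and its methods are hypothetical names. Entries stamped in the future are treated like stale ones, which is what lets corrupted state be flushed.

```python
from typing import Dict, Optional, Tuple


class LocTable:
    """Sketch of the loc table kept by a home location VSA."""

    def __init__(self, ttl_hb: float, d: float, e: float, ttl_vtov: float):
        self.max_age = ttl_hb + d + e + ttl_vtov
        self.loc: Dict[str, Tuple[int, float]] = {}  # client -> (reg, ts)

    def loc_update(self, q: str, region: int, t: float, now: float) -> None:
        old_ts = self.loc[q][1] if q in self.loc else float("-inf")
        if old_ts < t < now:  # only newer heartbeats, never future-stamped ones
            self.loc[q] = (region, t)

    def clean(self, now: float) -> None:
        """cleanLoc: drop entries that are too old or stamped in the future."""
        for q, (_, ts) in list(self.loc.items()):
            if not (now - self.max_age <= ts <= now):
                del self.loc[q]

    def region_of(self, q: str) -> Optional[int]:
        entry = self.loc.get(q)
        return entry[0] if entry else None
```

With ttl_hb = 5, d = e = 1, and ttl_VtoV = 3, an entry stamped at time 2 survives a cleanup at time 12 and is removed at 12.5.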

The other responsibility of the VSA is to receive and respond to local client requests for location information
on other clients. A client p in a VSA's region or a neighboring region v can send a query for q's
current location to the VSA. This is done via a mobile node's broadcast of a ((clocQuery, q), v) message.
When the VSA at region u receives this query, if no outstanding query for q exists, it notes the request for q
in lquery(q) and sends a vlocQuery message to q's f + 1 home locations, querying about q's location (lines
59-65). Any home location that receives such a message and has an entry for q's region responds with a
vlocReply to the querying VSA with the region (lines 67-70).

If the querying VSA at u receives a vlocReply in response to an outstanding location request for a client
q, it stores the attached region information in lquery(q) (lines 72-75), broadcasts a clocReply message with
q and its region to local clients, and erases the entry for lquery(q) (lines 77-81). If, however, 2ttl_VtoV + 2e
time passes after a request for q's region was received from a local client and there is no entry for q's region,
lquery(q) is simply erased (lines 83-87).
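The querying side can be sketched the same way. In this Python sketch, which assumes integer regions and a caller-supplied `home` function standing in for h, a second request for q that arrives while one is outstanding simply piggybacks on it. As a simplification, the clocReply broadcast is emitted in the same step as the vlocReply that enables it. `QueryingVSA` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple


class QueryingVSA:
    """Sketch of the lquery bookkeeping at the VSA of region u."""

    def __init__(self, u: int, nbrs: Set[int],
                 home: Callable[[str], List[int]], ttl_vtov: float, e: float):
        self.u, self.nbrs, self.home = u, nbrs, home
        self.window = 2 * ttl_vtov + 2 * e
        self.lquery: Dict[str, List] = {}  # q -> [deadline, qreg or None]
        self.vtovq: List[Tuple[int, tuple]] = []
        self.bcasts: List[tuple] = []

    def cloc_query(self, q: str, v: int, now: float) -> None:
        entry = self.lquery.get(q)
        local = v == self.u or v in self.nbrs
        if local and (entry is None or entry[0] < now):
            self.lquery[q] = [now + self.window, None]
            for r in self.home(q):  # one vlocQuery per home location
                self.vtovq.append((r, (self.u, ("vlocQuery", q))))
        # Otherwise the request piggybacks on the outstanding one (or is ignored).

    def vloc_reply(self, q: str, qreg: int) -> None:
        entry = self.lquery.get(q)
        if entry is not None:
            entry[1] = qreg
            # The clocReply output is now enabled; model it as immediate.
            self.bcasts.append(("clocReply", q, qreg, self.u))
            del self.lquery[q]

    def clean(self, now: float) -> None:
        """cleanLquery: drop entries whose deadline is outside [now, now + window]."""
        for q, (deadline, _) in list(self.lquery.items()):
            if not (now <= deadline <= now + self.window):
                del self.lquery[q]
```

Only the first vlocReply for an outstanding request is used; later replies for the same q find no entry and are dropped.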
Constants:
    ttl_VtoV
    ttl_hb
    h, a hash function from P × {1, …, f + 1} to U such that for p ∈ P and x, y ∈ {1, …, f + 1},
        if x ≠ y, then h(p, x) ≠ h(p, y)

Signature:
    Input brcv(⟨m, v⟩)_u, m ∈ ({heartbeat} × ℝ × P) ∪ ({clocQuery} × P), v ∈ U
    Input VtoVrcv(⟨v, m⟩)_u, v ∈ U,
        m ∈ ({locUpdate} × P × ℝ) ∪ ({vlocQuery} × P) ∪ ({vlocReply} × P × U)
    Output bcast(⟨⟨clocReply, q, qreg⟩, u⟩)_u, q ∈ P, qreg ∈ U
    Output VtoVsend(v, m)_u, v ∈ U
    Internal updateHL(q)_u, q ∈ P
    Internal cleanLoc(q)_u, q ∈ P
    Internal cleanLquery(q)_u, q ∈ P

State:
    loc, a table indexed on process ids with entries from U × ℝ≥0, of the form (reg, ts)
    lquery, a table indexed on process ids with entries from ℝ≥0 × U, of the form (to, qreg)
    vtovq, a queue of tuples from U × Msg
    (All of the above initially empty)
    analog now ∈ ℝ≥0, the current real time

Trajectories:
    satisfies
        d(now) = 1
        constant loc, lquery, vtovq
    stops when
        Any precondition is satisfied.

Actions:
    Output VtoVsend(v, m)_u
    Precondition:
        ⟨v, m⟩ ∈ vtovq
    Effect:
        vtovq ← vtovq \ {⟨v, m⟩}

    Input brcv(⟨⟨heartbeat, t, p⟩, v⟩)_u
    Effect:
        if (v ∈ nbrs(u) ∪ {u} ∧ now − d < t < now) then
            for i = 1 to f + 1
                vtovq ← vtovq ∪ {⟨h(p, i), ⟨v, ⟨locUpdate, p, t⟩⟩⟩}

    Input VtoVrcv(⟨v, ⟨locUpdate, q, t⟩⟩)_u
    Effect:
        if loc(q).ts < t < now then
            loc(q) ← (v, t)

    Internal cleanLoc(q)_u
    Precondition:
        loc(q).ts ∉ [now − ttl_hb − d − e − ttl_VtoV, now]
    Effect:
        loc(q) ← null

    Input brcv(⟨⟨clocQuery, q⟩, v⟩)_u
    Effect:
        if ([lquery(q) = null ∨ lquery(q).to < now] ∧ v ∈ nbrs(u) ∪ {u}) then
            lquery(q) ← (now + 2ttl_VtoV + 2e, ⊥)
            for j = 1 to f + 1
                vtovq ← vtovq ∪ {⟨h(q, j), ⟨u, ⟨vlocQuery, q⟩⟩⟩}

    Input VtoVrcv(⟨v, ⟨vlocQuery, q⟩⟩)_u
    Effect:
        if loc(q) ≠ null then
            vtovq ← vtovq ∪ {⟨v, ⟨u, ⟨vlocReply, q, loc(q).reg⟩⟩⟩}

    Input VtoVrcv(⟨v, ⟨vlocReply, q, qreg⟩⟩)_u
    Effect:
        if lquery(q) ≠ null then
            lquery(q).qreg ← qreg

    Output bcast(⟨⟨clocReply, q, lquery(q).qreg⟩, u⟩)_u
    Precondition:
        lquery(q).qreg ≠ ⊥
    Effect:
        lquery(q) ← null

    Internal cleanLquery(q)_u
    Precondition:
        lquery(q).to ∉ [now, now + 2ttl_VtoV + 2e]
    Effect:
        lquery(q) ← null

Figure 8: HLS's V_u^HL automaton.
5.3 Correctness

We make the system assumptions described in Section 3. Call C_0 the first global configuration where the
system is consistent. For the following two lemmas and theorem, assume we are in a configuration after C_0
and that no corruption failures occur.

Lemma 5.1 For any VSA u, if there is a request for q's region in lquery, it was submitted through an
HLquery(q) at a client within the last e_sample + d + 2ttl_VtoV + 2e time.

Proof sketch: Once a request is submitted by a client to C_p^HL, if the client has never received a GPSupdate,
it can take up to e_sample time for the client to receive one. After the client has received one, it broadcasts
the request to local VSAs, which takes up to d time to be delivered. VSAs then hold these queries until they
expire 2ttl_VtoV + 2e later. ■

Lemma 5.2 Starting e_sample + d + e + ttl_VtoV time after client p enters the system and until p fails, for
each interval of length ttl_VtoV + e, all but f of p's home locations will have a non-null loc(p) entry for the
entire interval. If client p is alive and there is some VSA u such that loc(p) is not null, then p was alive and
located in loc(p).reg within the last e_sample + d + e + ttl_VtoV time.

Proof sketch: Within e_sample time of a client entering the system, a GPSupdate occurs and the client transmits
a heartbeat message. This message can take up to d time to be received by a nearby VSA, after which
it can take e + ttl_VtoV time for the VSA to transmit the associated locUpdate message to the client's home
locations and have the message be received, updating any alive home locations' loc(p) entries. Since, for any
interval of length ttl_hb + d + 2e + ttl_VtoV, at most f of the client's home locations can be failed at any point
in the interval, all but f of the client's home locations will receive a locUpdate message and have a non-null
loc(p) entry, and will remain alive with a non-null loc(p) entry for at least ttl_VtoV + e after the next locUpdate
message is received (within ttl_hb + d + e + ttl_VtoV time after the first was sent). Since this is true for each
locUpdate message, there can be only f home locations that either do not have a non-null loc(p) entry or
will not be alive for an additional ttl_VtoV + e time.

For the second statement, note that an alive client p will send a heartbeat message within e_sample time of
arriving in a region, prompting updates to loc(p) at alive home locations within d + e + ttl_VtoV time. Hence,
if a client is alive, any non-null entry for loc(p).reg can be at most e_sample + d + e + ttl_VtoV old. ■

Theorem 5.3 Every client p searching for a non-failed client q that has been in the system longer than
ttl_HLS + e_sample + d + ttl_VtoV + e time will perform an HLreply(q, qreg) within time ttl_HLS, such that q was
located in region qreg no more than ttl_HLS time ago. No reply will occur if q has been failed for more than
ttl_hb + ttl_HLS − e_sample time. Any reply is in response to a query.

Proof sketch: For the first statement, by the previous lemma, we know that once client q has been in the
system for e_sample + d + e + ttl_VtoV time, any queries of its home locations will succeed in producing a
result. However, a new HLquery request "piggybacks" on any prior unexpired HLquery requests. Since one
of these requests could have been initiated just before client q's home locations are updated, we can only
guarantee that a response will be received for a new request if any outstanding requests will be answered. If the
client has been in the system for this total ttl_HLS + d + e + ttl_VtoV time after receiving its first GPSupdate,
then any response to a query can take as much as ttl_HLS time: e_sample time for the querying client to receive
its first GPSupdate, d time for the query to be transmitted and received by a local VSA, e + ttl_VtoV for the
local VSA to query a home location, e + ttl_VtoV for the response to arrive at a local VSA, e time for the local
VSA to transmit the response to its requesting clients, and d time for the transmission to be received and
translated into HLreplys at clients. This total is ttl_HLS. As for the age of the response, by the prior lemma,
we know that information can only be out of date by e_sample + ttl_VtoV + e + d time when a home location
responds to a query by another VSA. The response can take e + ttl_VtoV time to arrive at the querying VSA,
followed by e + d time for the querying VSA to get the information to the clients that prompted the query.
The oldest the information could be is the total.

For the second statement, note that a failed client will not send a heartbeat message. Since loc(p) entries
are cleared once ttl_hb + d + e + ttl_VtoV time has passed since the heartbeat message upon which they were based
was broadcast, and the information from an entry can take at most e + ttl_VtoV time to reach a
querying VSA and e + d time to reach any querying clients, the total, ttl_hb + 2d + 3e + 2ttl_VtoV =
ttl_hb + ttl_HLS − e_sample, is the maximum time after the client fails at which an HLreply can occur.

For the third statement, note that a query expires after ttl_HLS time. Hence, any response generated
must be for a query that occurred no more than that time before. ■
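Adding up the delays listed in the proof of Theorem 5.3 gives ttl_HLS = e_sample + 2d + 3e + 2ttl_VtoV, and the staleness argument in the same proof sums to the identical quantity. That is why a single constant bounds both the response time and the age of the reported region. The Python sketch below only restates that arithmetic; the function names are hypothetical.

```python
def ttl_hls_budget(e_sample: float, d: float, e: float, ttl_vtov: float) -> float:
    """Sum of the per-step delays listed in the proof of Theorem 5.3."""
    steps = [
        e_sample,        # querying client waits for its first GPSupdate
        d,               # query broadcast reaches a local VSA
        e + ttl_vtov,    # local VSA queries a home location
        e + ttl_vtov,    # reply travels back to the local VSA
        e,               # local VSA broadcasts clocReply
        d,               # broadcast received and turned into an HLreply
    ]
    return sum(steps)


def max_reply_age(e_sample: float, d: float, e: float, ttl_vtov: float) -> float:
    """Staleness bound from the second half of the same proof."""
    at_home = e_sample + ttl_vtov + e + d  # Lemma 5.2 bound at the home location
    return at_home + (e + ttl_vtov) + (e + d)
```

For instance, with e_sample = 1, d = 2, e = 1, and ttl_VtoV = 10, both functions give 28.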
Theorem 5.4 Starting from an arbitrary configuration, after VtoVComm has stabilized, it takes
max(ttl_HLS, 2e + 3ttl_VtoV + ttl_hb + 2d) time for HLS to stabilize.

Proof sketch: Once lower levels have stabilized, most client state is made locally consistent within e_sample
time, the time for the client to get a GPSupdate. This action resets most variables if the region is updated.
The remaining portions of client state are made consistent instantaneously by local correction actions, with
the exception of the heartbeat timer and the queryq variable. The heartbeat timer can only affect operations
for at most ttl_hb time. The queryq variable can only affect operations for ttl_HLS time, after which the entry
would be deleted.

For VSAs, there are two variables that are not instantaneously corrected: loc and lquery.

The loc variable will be consistent within time e + 2ttl_VtoV + ttl_hb + d. At worst, there could be a corrupted
message that arrives at a VSA after ttl_VtoV time, adding a bad entry that takes e + ttl_VtoV + ttl_hb + d time to
expire. If the client referred to is in the system, the information might not be cleaned up until the next update
after the timestamp of the corrupted message (which could have been delivered as late as ttl_VtoV after
corruptions stopped) arrives. This time is exactly what the offset term for loc timeouts describes.
Hence, the variable might not be cleaned until ttl_VtoV plus that offset term.

However, there may be responses based on this bad loc table information that were sent right at
e + 2ttl_VtoV + ttl_hb + d, and that take e + ttl_VtoV to arrive at the VSA. The resulting transmission to local
clients (taking d time to complete) is then incorrect. However, those incorrect transmissions cease after the
total time 2e + 3ttl_VtoV + ttl_hb + 2d elapses.

The lquery variable is cleaned up within ttl_HLS time. An entry in lquery remains in the data structure for at most
2ttl_VtoV + 2e time. It could be the case that a spurious request was transmitted at the beginning,
which adds d time. If a region response is received, it results in immediate correction of the state through
erasure. Hence, the time required to be consistent is the time that it takes for a query to be accounted for.

The maximum of ttl_HLS and 2e + 3ttl_VtoV + ttl_hb + 2d is the maximum stabilization time. ■
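The bound of Theorem 5.4 can be assembled from the two terms in the proof sketch: the time until the last corrupted loc entry expires (e + 2ttl_VtoV + ttl_hb + d), plus the time for a reply already sent from such an entry to travel and be rebroadcast (e + ttl_VtoV + d). The Python sketch below illustrates only this arithmetic; the function name is hypothetical.

```python
def hls_stabilization_time(ttl_hls: float, e: float, ttl_vtov: float,
                           ttl_hb: float, d: float) -> float:
    """Theorem 5.4 bound, assembled from the terms in its proof sketch."""
    loc_settled = e + 2 * ttl_vtov + ttl_hb + d  # last bad loc entry expires
    last_bad_reply = e + ttl_vtov + d            # in-flight reply, then local bcast
    return max(ttl_hls, loc_settled + last_bad_reply)
```

With e = 1, ttl_VtoV = 10, ttl_hb = 5, d = 2, and ttl_HLS = 28, the loc term dominates: 28 + 13 = 41.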

5.4 Extensions 

Here we briefly describe some possible extensions to our HLS algorithm: 

Home location voting mechanisms: In systems where corruption failures are limited in number at the
VSA level, our implementation could be extended to use a voting mechanism, allowing information from
corrupted home locations to be "weeded out." Rather than having querying VSAs wait for a single region response
from a home location VSA, they could wait until the same region is returned by a majority of home
location VSAs. If corruption is limited to some small number of VSAs at a time, but can happen often,
then this voting mechanism can be used to provide a stronger location service, immune to this limited
number of faults.
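The paper leaves the voting rule unspecified. A minimal sketch, assuming each of the f + 1 home locations contributes at most one region reply and that a strict majority of all f + 1 (not just of those that replied) is required, could look like this in Python; `majority_region` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_region(replies: Iterable[Hashable], num_home: int) -> Optional[Hashable]:
    """Region reported by a strict majority of all num_home home locations.

    replies holds at most one region per responding home location; too few
    or split answers yield None, so the querying VSA keeps waiting.
    """
    counts = Counter(replies)
    if not counts:
        return None
    region, votes = counts.most_common(1)[0]
    return region if votes > num_home // 2 else None
```

With three home locations, two matching replies suffice even if the third is corrupted or missing, whereas a 1–1 split yields no answer yet.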

Randomized asymmetric quorums: It is possible to have asymmetric updates and queries, such as local
updates to close-by VSAs and queries to uniformly selected VSAs, or vice versa (the expected number of VSAs
that must be updated and queried is small, as proved in [22]). Instead of using a predefined set
to query, one might use a randomized scheme based on [22], where a random set of regions is chosen for
updating and inquiring about the location of a client node. Moreover, we could enhance the scheme in [22]
by using a predefined set for location updates (such as the close-by regions) and a random set for location
queries (or vice versa).
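For intuition on why random update and query sets can stay small, the following Python sketch computes the probability that a uniformly random query set of k_query regions misses a fixed update set of k_update regions out of n. By symmetry, the same value holds when both sets are random and independent. This is standard counting, not the analysis of [22]. With k_update = k_query = c√n, the probability is at most e^{-c²}, because each factor of the product C(n − k, k)/C(n, k) is at most 1 − k/n. The function name is hypothetical.

```python
import math
from fractions import Fraction


def miss_probability(n: int, k_update: int, k_query: int) -> Fraction:
    """P(a uniformly random k_query-subset of n regions avoids a fixed
    k_update-subset), i.e. a query that reaches no updated region."""
    # math.comb returns 0 when k > n, so overlapping-by-force cases give 0.
    return Fraction(math.comb(n - k_update, k_query), math.comb(n, k_query))
```

For example, with n = 10000 regions and 200 regions in each set (c = 2), the miss probability is below 0.02.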

Attribute queries: There are scenarios in which one would like to query for client nodes with certain 
attributes in a geographic area (e.g., a search for a medical doctor who is currently nearby). Our scheme
supports such queries in a natural way: Attributes can hash to home locations that store tables of clients 
with the attribute, and their locations. Clients searching for another nearby client with some attribute could 
then have a local VSA query home locations for the attribute, and select a nearby client from the list that 
is returned. 
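Here the attribute takes the place of the client id as input to the hash function, and each home location keeps a table of clients that have the attribute. The Python sketch below shows such a table together with a "nearest fresh member" selection. It assumes grid regions, Manhattan distance, and a freshness window analogous to the loc timeout; the class, the metric, and `max_age` are illustrative choices, not part of the paper.

```python
from typing import Dict, Optional, Tuple

Region = Tuple[int, int]  # assumption: grid-cell regions


class AttributeHome:
    """Hypothetical table kept by one home location of an attribute."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.members: Dict[str, Tuple[Region, float]] = {}  # client -> (reg, ts)

    def update(self, client: str, region: Region, ts: float) -> None:
        old = self.members.get(client)
        if old is None or old[1] < ts:  # keep only the newest report
            self.members[client] = (region, ts)

    def nearest(self, near: Region, now: float) -> Optional[Tuple[str, Region]]:
        """Fresh member closest (Manhattan distance) to near, or None."""
        fresh = [(c, r) for c, (r, ts) in self.members.items()
                 if now - self.max_age <= ts <= now]
        if not fresh:
            return None
        return min(fresh, key=lambda cr: (abs(cr[1][0] - near[0])
                                          + abs(cr[1][1] - near[1]), cr[0]))
```

In the scheme described above, the local VSA would gather such lists from the attribute's home locations and make the final selection itself.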

6 Client end-to-end routing (EtoEComm) implementation 

Our implementation of the end-to-end routing service, EtoEComm, uses the location service to discover a
recent region location of a destination client node and then uses this location in conjunction with VtoVComm
to deliver messages (see Figure 9). As in the implementation of the Home Location Service, there are two
parts to the end-to-end routing implementation: the client-side portion and the VSA-side portion. Also as in
HLS, time and region information from the GPS oracle is used throughout this implementation to timestamp
and label information.

Figure 9: End-to-end routing. A client C_p^E2E can send a message to another client C_q^E2E by querying HLS
for q's region, and then having local VSAs forward the message to q's local regions through VtoVComm.
The message is received by those VSAs and broadcast for delivery by C_q^E2E.

The client-side portion C_p^E2E takes a request to send a message to another client q, queries HLS
for q's location, and submits the message to a VSA in its current or a neighboring region, to be sent
to q's location. It also takes messages originating at other clients and transmitted to it by its current or
neighboring regions' VSAs, and delivers them.

The VSA portion V_u^E2E is very simple. A client may send it information to be transmitted to other
VSAs, which it forwards through VtoVComm, or another VSA may send it information to be delivered to a
client in its own or a neighboring region, which it forwards through V-bcast.

6.1 EtoEComm client actions 

The signature, state, and actions of C_p^E2E are in Figure 10. The main variable phbook is a table, indexed
on destination pid, with entries of the form (reg, ttl, msg). For a client q, phbook(q).reg stores the current
region of q (or ⊥ if it is unknown). The field ttl stores a timeout for phbook(q).reg if
the region of q is known, and a timeout for querying for the region if not. The set msg stores messages
waiting to be sent to q.

The GPSupdate(v, t) action (line 36) updates the client's reg variable to the region v indicated in the
action and sets the local clock now to t.

A message m is sent to another client q via send(q, m)_p. If a region phbook(q).reg for q is known, this input
to C_p^E2E results in the forwarding of the message to the VSA of p's current region through
bcast(((sdata, m, q, phbook(q).reg), p, u)) (lines 44-45); if the client does not know the location of q, the
message is saved in phbook(q).msg (lines 46-48).

If a recent region for q is not known, C_p^E2E attempts to discover one. It queries HLS to determine where q
was through the HLquery(q)_p action (line 50). A timeout for the response to the location request, phbook(q).ttl,
is set to ttl_HLS later. If the timeout expires but no messages are waiting to be sent, cleanPhbook(q) erases
the entry, preventing unnecessary HLquerying (line 63).
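The phbook handling in Figure 10 (send, the enabling of HLquery, and HLreply) can be sketched in Python as follows. The sketch assumes string ids and messages and integer regions, returns due queries from a polling method instead of modeling output actions, and omits cleanPhbook, the sdata broadcast, and delivery. `Entry`, `E2EClient`, and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    reg: Optional[int]    # last known region of q, or None (⊥)
    ttl: Optional[float]  # timeout for reg, or for the pending HLquery
    msgs: List[str] = field(default_factory=list)


class E2EClient:
    """Sketch of the phbook handling in C_p^E2E."""

    def __init__(self, ttl_hls: float, ttl_pb: float):
        self.ttl_hls, self.ttl_pb = ttl_hls, ttl_pb
        self.phbook: Dict[str, Entry] = {}
        self.sdataq: List[Tuple[str, str, int]] = []  # (m, q, qreg) to bcast

    def send(self, q: str, m: str, now: float) -> None:
        ent = self.phbook.get(q)
        if ent is not None and ent.reg is not None and ent.ttl > now:
            self.sdataq.append((m, q, ent.reg))        # location known and fresh
        elif ent is None or (ent.ttl is not None and ent.ttl < now):
            self.phbook[q] = Entry(None, None, [m])    # start a new lookup
        else:
            ent.msgs.append(m)                         # lookup already pending

    def due_queries(self, now: float) -> List[str]:
        """Destinations for which the HLquery output is enabled at now."""
        due = []
        for q, ent in self.phbook.items():
            if ent.reg is None and (ent.ttl is None or ent.ttl > now + self.ttl_hls):
                ent.ttl = now + self.ttl_hls
                due.append(q)
        return due

    def hl_reply(self, q: str, qreg: int, now: float) -> None:
        ent = self.phbook.get(q)
        for m in (ent.msgs if ent else []):
            self.sdataq.append((m, q, qreg))           # flush buffered messages
        self.phbook[q] = Entry(qreg, now + self.ttl_pb, [])
```

Messages sent while a lookup is pending are buffered in the entry and flushed together when the HLreply arrives. After ttl_pb expires, the next send starts a fresh lookup rather than trusting a stale region.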

Once a response to an HLquery(q) is received from HLS in the form of HLreply(q, qreg)_p (line 57), indicating that q was in region qreg, the entry phbook(q).reg is updated to qreg and phbook(q).ttl is updated to $now + ttl_{pb}$, storing the location of q and setting a timeout for the use of this location information. For each message waiting to be sent to q in phbook(q).msg, the message, together with the location information for the destination, is forwarded to the VSAs of p's current and neighboring regions through bcast(((sdata, m, q, qreg), p, reg)) (lines 59-60, 30-34).

Constants:
    ttl_HLS

Signature:
    Input send(q, m)_p, q ∈ P
    Input HLreply(q, v)_p, q ∈ P, v ∈ U
    Input GPSupdate(v, t)_p, v ∈ U, t ∈ ℝ
    Input brcv(((rdata, m), p, u))_p, u ∈ U
    Output HLquery(q)_p, q ∈ P
    Output bcast(m)_p
    Output receive(m)_p
    Internal cleanPhbook(q)_p, q ∈ P

State:
    analog now ∈ ℝ, current real time, initially ⊥
    reg ∈ U, the current region, initially ⊥
    phbook, a table indexed on process id with entries from U × ℝ × 2^msgs, of the form (reg, ttl, msg), initially ∅
    sdataq, deliverq, queues of messages, initially ∅

Trajectories:
    satisfies
        d(now) = 1
        constant reg, phbook, sdataq, deliverq
    stops when
        Any precondition is satisfied.

Actions:
    Output bcast(((sdata, m, q, qreg), p, reg))_p
    Precondition:
        (m, q, qreg) ∈ sdataq ∧ reg ≠ ⊥
    Effect:
        sdataq ← sdataq \ {(m, q, qreg)}

    Input GPSupdate(v, t)_p
    Effect:
        now ← t
        if reg ≠ v then
            reg ← v

    Input send(q, m)_p
    Effect:
        if (phbook(q).reg ≠ ⊥ ∧ phbook(q).ttl > now) then
            sdataq ← sdataq ∪ {(m, q, phbook(q).reg)}
        else if (phbook(q) = null ∨ phbook(q).ttl < now) then
            phbook(q) ← (⊥, ⊥, {m})
        else phbook(q).msg ← phbook(q).msg ∪ {m}

    Output HLquery(q)_p
    Precondition:
        phbook(q) = (⊥, ttl, msg)
        ∧ (ttl = ⊥ ∨ ttl > now + ttl_HLS)
    Effect:
        phbook(q).ttl ← now + ttl_HLS

    Input HLreply(q, qreg)_p
    Effect:
        for each m ∈ phbook(q).msg
            sdataq ← sdataq ∪ {(m, q, qreg)}
        phbook(q) ← (qreg, now + ttl_pb, ∅)

    Internal cleanPhbook(q)_p
    Precondition:
        phbook(q) = (qreg, ttl, msg) ∧ [(qreg = ⊥ ∧ msg = ∅)
            ∨ (qreg ≠ ⊥ ∧ [ttl > now + ttl_pb ∨ msg ≠ ∅]) ∨ ttl < now]
    Effect:
        phbook(q) ← null

    Input brcv(((rdata, m), p, u))_p
    Effect:
        if u ∈ {reg} ∪ nbrs(reg) then
            deliverq ← deliverq ∪ {m}

    Output receive(m)_p
    Precondition:
        m ∈ deliverq
    Effect:
        deliverq ← deliverq \ {m}

Figure 10: EtoEComm's $C^{E2E}$ automaton.
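The HLreply effect and the bcast output that drains sdataq might look as follows in Python. The representation (tuples with None for ⊥, a list used as a FIFO queue, integer regions) is an assumption made for illustration, and waiting messages are sorted only to make the example reproducible.

```python
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[Optional[int], Optional[float], Set[str]]  # (reg, ttl, msgs); None = unknown
Queued = Tuple[str, str, int]                            # (m, q, qreg)


def hl_reply(phbook: Dict[str, Entry], sdataq: List[Queued],
             q: str, qreg: int, now: float, ttl_pb: float) -> None:
    """HLreply(q, qreg)_p: release waiting messages and cache q's region for ttl_pb."""
    _, _, msgs = phbook.get(q, (None, None, set()))
    for m in sorted(msgs):  # sorted only to make the example deterministic
        sdataq.append((m, q, qreg))
    phbook[q] = (qreg, now + ttl_pb, set())


def next_bcast(sdataq: List[Queued], p: str, reg: Optional[int]):
    """bcast(((sdata, m, q, qreg), p, reg))_p: enabled when something is queued and reg is known."""
    if reg is None or not sdataq:
        return None
    m, q, qreg = sdataq.pop(0)
    return (("sdata", m, q, qreg), p, reg)


phbook: Dict[str, Entry] = {"q": (None, 5.0, {"b", "a"})}
sdataq: List[Queued] = []
hl_reply(phbook, sdataq, "q", qreg=3, now=2.0, ttl_pb=10.0)
print(phbook["q"])                     # (3, 12.0, set())
print(next_bcast(sdataq, "p", reg=7))  # (('sdata', 'a', 'q', 3), 'p', 7)
```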

Messages for client p from other clients are received from the VSA of p's current region or of a neighboring region v through brcv(((rdata, m), p, v))_p (line 70). The message m is subsequently delivered through the output receive(m)_p (line 75).
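The region filter applied on receipt can be illustrated with a hypothetical square-grid geometry in which the neighbors of a region are the eight cells touching it. The paper does not fix this geometry; nbrs, on_brcv, and receive_next below are illustrative names.

```python
from typing import List, Optional, Set, Tuple

Region = Tuple[int, int]  # hypothetical: regions are cells of a square grid


def nbrs(r: Region) -> Set[Region]:
    """Hypothetical neighbor relation: the 8 grid cells touching r."""
    x, y = r
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)}


def on_brcv(deliverq: List[str], reg: Region, m: str, u: Region) -> None:
    """brcv(((rdata, m), p, u))_p: accept m only from the VSA of reg or of a neighbor of reg."""
    if u == reg or u in nbrs(reg):
        deliverq.append(m)


def receive_next(deliverq: List[str]) -> Optional[str]:
    """receive(m)_p: hand the next delivered message to the application."""
    return deliverq.pop(0) if deliverq else None


deliverq: List[str] = []
on_brcv(deliverq, reg=(0, 0), m="near", u=(1, 1))  # neighboring VSA: accepted
on_brcv(deliverq, reg=(0, 0), m="far", u=(2, 0))   # two cells away: ignored
print(receive_next(deliverq))  # near
print(receive_next(deliverq))  # None
```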

6.2 EtoEComm VSA actions 

The signature, state, and actions of $V^{E2E}$ are in Figure 11.

The receipt of a message m to be sent from a client p to a client q at region qreg through brcv(((sdata, m, q, qreg), p, v))_u, where v is either u or a neighbor of u (line 33), results in the forwarding of the message to the virtual automata at the regions in calcregs(qreg) and their neighboring regions, via the virtual automata communication action VtoVsend(w, (data, m, q))_u for each such region w (lines 33-38). The set calcregs(qreg) contains the regions that q could occupy by the time the message is delivered to it (since we do not require the client to be stationary during execution of the algorithm). As will be seen shortly, the definition of calcregs depends on assumptions about client mobility.
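Under condition (1) of Section 6.3, calcregs(qreg) is qreg together with its neighbors, and the VSA then addresses those regions plus their neighbors. The Python sketch below assumes the same kind of hypothetical 8-neighbor square grid, on which that target set is a 5 × 5 block of cells; it also shows the check that the sender's region v is u or a neighbor of u. All names are illustrative.

```python
from typing import List, Set, Tuple

Region = Tuple[int, int]  # hypothetical square-grid regions


def nbrs(r: Region) -> Set[Region]:
    """Hypothetical neighbor relation: the 8 grid cells touching r."""
    x, y = r
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)}


def calcregs(v: Region) -> Set[Region]:
    """Condition (1): q is assumed to be in v or in a neighbor of v."""
    return nbrs(v) | {v}


def forward_targets(qreg: Region) -> Set[Region]:
    """Regions in calcregs(qreg) together with all of their neighbors."""
    qregions = calcregs(qreg)
    targets = set(qregions)
    for r in qregions:
        targets |= nbrs(r)
    return targets


def on_sdata(vtovq: List[Tuple[Region, tuple]], u: Region, v: Region,
             m: str, q: str, qreg: Region) -> None:
    """brcv(((sdata, m, q, qreg), p, v))_u at the VSA of region u."""
    if v == u or v in nbrs(u):
        for w in sorted(forward_targets(qreg)):
            vtovq.append((w, ("data", m, q)))


vtovq: List[Tuple[Region, tuple]] = []
on_sdata(vtovq, u=(0, 0), v=(0, 1), m="hi", q="q", qreg=(10, 10))
print(len(calcregs((10, 10))), len(vtovq))  # 9 25
```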

Likewise, the receipt via VtoVrcv((data, m, p))_u (line 40) of a message m intended for client p results in the forwarding of the message to p via bcast(((rdata, m), p, u))_u (line 42).

Signature:
    Input brcv(((sdata, m, q, qreg), p, v))_u, p, q ∈ P, qreg, v ∈ U
    Input VtoVrcv((data, m, p))_u, p ∈ P
    Output VtoVsend(v, m)_u, v ∈ U
    Output bcast(m)_u

State:
    vtovq, a queue of tuples from U × msg, initially ∅
    bcastq, a queue of messages, initially ∅

Trajectories:
    satisfies
        constant vtovq, bcastq
    stops when
        Any precondition is satisfied.

Actions:
    Output bcast(m)_u
    Precondition:
        m ∈ bcastq
    Effect:
        bcastq ← bcastq \ {m}

    Output VtoVsend(v, m)_u
    Precondition:
        (v, m) ∈ vtovq
    Effect:
        vtovq ← vtovq \ {(v, m)}

    Input brcv(((sdata, m, q, qreg), p, v))_u
    Effect:
        if v ∈ nbrs(u) ∪ {u} then
            let qregions = calcregs(qreg) in
            for each w ∈ qregions ∪ nbrs(qregions)
                vtovq ← vtovq ∪ {(w, (data, m, q))}

    Input VtoVrcv((data, m, p))_u
    Effect:
        bcastq ← bcastq ∪ {((rdata, m), p, u)}

    function calcregs(v: U): 2^U =
        return nbrs(v) ∪ {v}

Figure 11: EtoEComm's $V^{E2E}$ automaton.

6.3 Correctness 

We make the system assumptions described in Section 3. Correctness of the EtoEComm implementation 
is dependent on assumptions about client mobility and the definition of the function calcregs, used in the 
EtoEComm VSA algorithm. We can prove correctness under either of the following two conditions: 

(1) calcregs(qreg) returns the set containing qreg and its neighbors, and each client remains in a region for at least $\epsilon_{sample} + 3\,ttl_{VtoV} + 5e + 4d + ttl_{pb}$ time before moving to a neighboring region, or

(2) calcregs(qreg) returns the set containing qreg and each region v such that the supremum distance between any two points in v and qreg is at most $v_{max} \cdot (\epsilon_{sample} + 3\,ttl_{VtoV} + 5e + 4d + ttl_{pb})$.
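Condition (2) can be made concrete if regions are assumed, hypothetically, to be axis-aligned square cells of side `side`. For two such cells, the supremum distance between their points is the distance between their two farthest corners, so the sketch below enumerates nearby cells and keeps those within $v_{max} \cdot T$, where T is the time bound in the conditions above. The function names and the grid geometry are illustrative assumptions.

```python
import math
from typing import Set, Tuple

Region = Tuple[int, int]  # hypothetical: axis-aligned square cells of side `side`


def horizon(eps_sample: float, ttl_vtov: float, e: float, d: float, ttl_pb: float) -> float:
    """The time bound that appears in conditions (1) and (2)."""
    return eps_sample + 3 * ttl_vtov + 5 * e + 4 * d + ttl_pb


def sup_distance(a: Region, b: Region, side: float) -> float:
    """Largest Euclidean distance between a point of cell a and a point of cell b."""
    dx = (abs(a[0] - b[0]) + 1) * side
    dy = (abs(a[1] - b[1]) + 1) * side
    return math.hypot(dx, dy)


def calcregs_cond2(qreg: Region, side: float, v_max: float, T: float) -> Set[Region]:
    """qreg plus every cell whose supremum distance to qreg is at most v_max * T."""
    limit = v_max * T
    reach = int(limit // side)  # cells farther than this along an axis cannot qualify
    result = {qreg}
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            cell = (qreg[0] + i, qreg[1] + j)
            if sup_distance(qreg, cell, side) <= limit:
                result.add(cell)
    return result


T = horizon(eps_sample=1, ttl_vtov=1, e=1, d=1, ttl_pb=1)
print(T)  # 14
print(sorted(calcregs_cond2((0, 0), side=5, v_max=1, T=T)))
# [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]
```

With a small speed bound relative to the cell size, the set collapses to qreg alone, which is why qreg is always added explicitly.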

We now outline the correctness of EtoEComm under these assumptions. For the first lemma and theorem, assume we start in a safe configuration and no corruption failures occur.

Lemma 6.1 Consider an alive client q such that some other client p has a non-null, non-⊥ entry for phbook(q).reg. If q does not fail for an additional $2d + 2e + ttl_{VtoV}$ time, then at any point in that interval, q will be located in a region in calcregs(phbook(q).reg).

Proof sketch: First, we note that a non-null, non-⊥ entry phbook(q).reg holds information that is at most $\epsilon_{sample} + 2\,ttl_{VtoV} + 3e + 2d$ out of date (from HLS) when it is first installed, after which it is kept for an additional $ttl_{pb}$ time.

If we are assuming condition 1, client q must be in the region indicated or in a neighboring region, and will remain in those regions for an additional $2d + 2e + ttl_{VtoV}$ time. If we are assuming condition 2, at any point up to $2d + 2e + ttl_{VtoV}$ later, client q can be in any region reachable from qreg within the total time $\epsilon_{sample} + 3\,ttl_{VtoV} + 5e + 4d + ttl_{pb}$ when traveling at speed $v_{max}$. ■
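Adding the three intervals in the proof sketch shows where the time bound in conditions (1) and (2) comes from:

$$\underbrace{(\epsilon_{sample} + 2\,ttl_{VtoV} + 3e + 2d)}_{\text{age when installed}} + \underbrace{ttl_{pb}}_{\text{retention in phbook}} + \underbrace{(2d + 2e + ttl_{VtoV})}_{\text{delivery interval}} = \epsilon_{sample} + 3\,ttl_{VtoV} + 5e + 4d + ttl_{pb}$$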

Theorem 6.2 Consider a client p that performs a send(q, m) and does not change regions for $ttl_{HLS}$ time. If client q has been in the system for $ttl_{HLS} + \epsilon_{sample} + d + ttl_{VtoV} + e$ time and does not fail, then q will perform a receive(m) within $ttl_{HLS} + 2d + 2e + ttl_{VtoV}$ time. If a client receives a message, that message must previously have been sent to it.

Theorem 6.3 Starting from an arbitrary configuration, after HLS has stabilized, it takes $ttl_{pb} + 2d + 2e + ttl_{VtoV}$ time for EtoEComm to stabilize.

Proof sketch: Bad region information can remain in phbook for up to $ttl_{pb}$ time, and messages sent using this information may remain in the system for up to $d + e + ttl_{VtoV} + e + d$ longer before being delivered or cleared. At the same time, while HLS has been stabilizing, phbook's message collection can take up to $ttl_{HLS}$ time to be cleared. The maximum of these quantities is the time for EtoEComm to stabilize. ■
6.4 Extensions

Here we briefly describe some possible extensions to our EtoEComm algorithm: 

Routing optimizations: Once the location of a client is known, communication with the client can be continued directly, and movements during the conversation may be piggy-backed on the information transferred in order to update the destination according to the move (as suggested in [12]). We also note that we can use an embedded tree location scheme such as the one in [12], implemented by virtual automata, where intermediate tree nodes are also mapped to regions.

Sleeping client messaging service: Mobile clients might be able to shut down to conserve power. We 
could guarantee that a sleeping client eventually receives messages intended for it by having local VSAs save 
the messages. The VSAs then, at predefined times, broadcast the messages. Sleeping clients awake for these 
broadcasts, receive their messages, and can go to sleep again afterwards. 
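One way to picture the sleeping-client extension is a per-VSA mailbox that holds messages and releases them only at predefined broadcast times. The sketch assumes, purely for illustration, that those times are the integer multiples of a fixed period; the class SleepMailbox and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class SleepMailbox:
    """Hypothetical per-VSA store of messages for sleeping clients."""

    def __init__(self, period: int) -> None:
        self.period = period  # assumed spacing of the predefined broadcast times
        self.saved: DefaultDict[str, List[str]] = defaultdict(list)

    def save(self, client: str, m: str) -> None:
        self.saved[client].append(m)

    def next_broadcast_time(self, now: int) -> int:
        """First predefined broadcast time at or after now (multiples of period)."""
        return -(-now // self.period) * self.period

    def broadcast(self, now: int) -> List[Tuple[str, str]]:
        """Release all saved messages, but only at a predefined broadcast time."""
        if now % self.period != 0:
            return []
        out = [(client, m) for client, msgs in self.saved.items() for m in msgs]
        self.saved.clear()
        return out


box = SleepMailbox(period=5)
box.save("a", "x")
box.save("b", "y")
print(box.next_broadcast_time(7))  # 10
print(box.broadcast(7))            # []
print(box.broadcast(10))           # [('a', 'x'), ('b', 'y')]
```

A sleeping client would then only need to wake around the times returned by next_broadcast_time.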

7 Concluding remarks 

We described how both the GPS oracle and the VSA programming layer could help implement self-stabilizing 
geocast routing, location management, and end-to-end routing services. The self-stabilizing VSA layer 
provides a virtual fixed infrastructure useful for solving a variety of problems. It acts as a fault-tolerant, 
self-stabilizing building block for services, allowing applications to be built for mobile networks as though 
base stations existed for mobile clients to interact with. 

The GPS oracle's frequently refreshed and reliable timing and location information made providing self-stabilization easier. We believe the paradigm of an external service providing reliable information that can be used in a self-stabilizing service implementation is an especially important and relevant one in mobile networks. Mobile networks demonstrate many properties that naturally require self-stabilizing implementations, such as a need for self-configuration, or the possibility of unpredictable kinds of failures, but also often have access to reliable external knowledge that can act as a source of shared consistency in the network; here, accurate region knowledge allowed nodes to determine who they should be communicating with (current region and neighboring region nodes), and time information allowed them to order messages and assess timeliness of information.